UForm-Gen2-dpo is a small generative vision-language model aligned for image captioning and visual question answering. The alignment uses Direct Preference Optimization (DPO) on the VLFeedback and LLaVA-Human-Preference-10K preference datasets.
Tags: Image-to-Text, Transformers, English
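For readers unfamiliar with DPO, the per-example objective can be sketched in a few lines. This is a minimal illustration of the standard DPO loss, not the training code used for this model. The function name `dpo_loss`, its argument names, and the default `beta=0.1` are assumptions chosen for the example. Each argument is the summed log-probability of a full response (chosen or rejected) under either the policy being trained or the frozen reference model.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-example DPO loss (illustrative sketch; names and beta are assumptions)."""
    # How much more the policy favors each response than the reference does.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp

    # Scaled preference margin: positive when the policy has shifted
    # toward the chosen response relative to the reference.
    margin = beta * (chosen_logratio - rejected_logratio)

    # loss = -log(sigmoid(margin)) = log(1 + exp(-margin)),
    # computed in a numerically stable form for either sign of margin.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))
```

When the policy and the reference agree, the margin is zero and the loss equals log 2. The loss falls below that value as the policy assigns relatively more probability to the preferred response, and rises above it when the policy moves toward the rejected one.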